Reducing Recovery Time in a Small Recursively Restartable System

Authors

  • George Candea
  • James W. Cutler
  • Armando Fox
  • Rushabh Doshi
  • Priyank Garg
  • Rakesh Gowda
Abstract

We present ideas on how to structure software systems for high availability by considering the MTTR/MTTF characteristics of components in addition to traditional criteria such as functionality or state sharing. Recursive restartability (RR), a recently proposed technique for achieving high availability, exploits partial restarts at various levels within complex software infrastructures to recover from transient failures and rejuvenate software components. Here we refine the original proposal and apply the RR philosophy to Mercury, a COTS-based satellite ground station that has been in operation for over two years. We develop three techniques for transforming component group boundaries so that time-to-recover is reduced, thereby increasing system availability. We also extend RR by defining the notions of an oracle, a restart group, and a restart policy, and we show how to reason about system properties in terms of restart groups. From our experience with applying RR to Mercury, we draw design guidelines and lessons for the systematic application of recursive restartability to other software systems amenable to RR.
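
The link the abstract draws between a shorter time-to-recover and higher availability follows from the standard steady-state formula, availability = MTTF / (MTTF + MTTR). The sketch below only illustrates that arithmetic; the function name and all figures are hypothetical and are not taken from the paper or from Mercury.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the long-run fraction of time a component is up."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical figures for illustration only (not measurements from the paper).
MTTF_HOURS = 100.0
full_restart = availability(MTTF_HOURS, mttr_hours=1.0)     # slower, whole-system restart
partial_restart = availability(MTTF_HOURS, mttr_hours=0.1)  # faster, partial restart

# With MTTF held fixed, a smaller MTTR always yields a higher availability.
print(f"full restart:    {full_restart:.6f}")
print(f"partial restart: {partial_restart:.6f}")
```

Holding MTTF constant isolates the effect the abstract describes: the failure rate is unchanged, and only the recovery time shrinks.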


Similar resources

Designing for High Availability and Measurability

We propose a structuring model, called recursive restartability, aimed at controlling the amount of end-to-end unavailability and improving the measurability of software infrastructures with high availability requirements. Recursive restartability exploits the benefits of restarts at various levels within complex software systems and relies on an execution infrastructure to monitor, cure, and re...


An Implementation of User-level Restartable Atomic Sequences on the NetBSD Operating System

This paper outlines an implementation of restartable atomic sequences on the NetBSD operating system as a mechanism for implementing atomic operations in a mutual-exclusion facility on uniprocessor systems. Kernel-level and user-level interfaces are discussed along with implementation details. Issues associated with protecting restartable atomic sequences from violation are considered. The perf...


Discrete Time Analysis of Multi-Server Queueing System with Multiple Working Vacations and Reneging of Customers

This paper analyzes a discrete-time $Geo/Geo/c$ queueing system with multiple working vacations and reneging in which customers arrive according to a geometric process. As soon as the system becomes empty, all the servers go on a working vacation together. The service times during the regular busy period and the working vacation period, as well as the vacation times, are assumed to be geometrically distributed. Customer...


Fast Mutual Exclusion for Uniprocessors

In this paper we describe restartable atomic sequences, an optimistic mechanism for implementing simple atomic operations (such as Test-And-Set) on a uniprocessor. A thread that is suspended within a restartable atomic sequence is resumed by the operating system at the beginning of the sequence, rather than at the point of suspension. This guarantees that the thread eventually executes the sequ...


Evaluating the Recovery Process of Renal Ischemia/Reperfusion Injury in Rats Using Small-Animal SPECT

Background: Renal injuries associated with ischemia/reperfusion are a prevalent clinical phenomenon that can cause the emergence of progressive kidney diseases, eventually leading to chronic kidney injuries. The present study was conducted to evaluate the results obtained from non-invasive imaging using small-animal SPECT and investigate the recovery process in an animal model of renal ischemia...



Publication date: 2002